Accurate detection of atrial fibrillation events with R-R intervals from ECG signals

Atrial fibrillation (AF) is a common type of arrhythmia. Clinical diagnosis of AF is based on the detection of abnormal R-R intervals (RRIs) in an electrocardiogram (ECG). Previous studies treated this detection problem as a classification problem and focused on extracting a number of features. In this study, we demonstrate that, instead of any specific numerical characteristic, the probability density of the RRIs from an ECG can serve as the input feature: it retains comprehensive statistical information and is therefore a natural and efficient input feature for AF detection. With a support vector machine as the classifier, results on the MIT-BIH databases indicate that the proposed method is a simple and accurate approach for AF detection in terms of accuracy, sensitivity, and specificity.


Introduction
Atrial fibrillation (AF or AFIB) is a type of abnormal heart rhythm (arrhythmia) characterized by rapid, irregular beating of the heart's upper chambers, which can cause blood to pool and clot inside the heart and thereby increases the risk of heart attack, heart failure, and stroke [1]. AF frequently begins with short periods of arrhythmia, such as abnormal beating or atrial flutter, followed by longer arrhythmia periods that sometimes last for hours, occasionally accompanied by heart palpitations, fainting, lightheadedness, shortness of breath, or chest pain [2].
The clinical diagnosis of AF is based on the surface electrocardiogram (ECG); because of the disorganized atrial electrical activity, AF is characterised by the absence of P waves. However, the P wave has a relatively low amplitude and is often obscured by baseline wander, which makes it difficult to detect. The R-R interval (RRI), which reflects the ventricular inter-beat interval, was therefore proposed as a significant biomarker for AF detection [3]. Compared with RRIs in regular-rhythm segments, consecutive RRIs during AF episodes exhibit lower means and larger fluctuations, reflecting rapid and irregular heartbeats. Fig 1 illustrates a typical ECG record (04043) from the MIT-BIH atrial fibrillation database (AFDB) [4,5], which shows the different RRI patterns (red line) inside and outside AF segments. Since the duration of AF episodes may range from a few seconds to hours, the chance of detecting AF depends heavily on the monitoring period of the …

Table 1 lists 35 published studies that developed AF detection methods. Tateno and Glass [3] first noticed the increased variation during AF episodes and hence proposed the coefficient of variation of RRI and ΔRRI (the first-order difference of RRI) as features. They then used statistical hypothesis testing to verify the presence of AF events, and also proposed using the Kolmogorov-Smirnov test to compare the histograms of AF RRIs and normal RRIs. Many subsequent studies treated AF detection as a classification problem and focused on extracting various features and designing classifiers.
These features include entropy [9–13]; mean and/or median (with or without normalization), root mean square, and/or variance [14–16]; quantiles [16,17]; median absolute deviation [10,16,17]; coefficients of the wavelet transform [12,13]; the Markov score [18] of RRI and/or ΔRRI; or a combination of several features [10,11,16,19,20]. In recent studies, deep learning algorithms such as long short-term memory (LSTM) networks [21,22] and others [20,23–25] have been used to process the original signals without explicit feature extraction. In this study, we take a statistical perspective: instead of employing any single numerical characteristic (e.g., mean, variance, or skewness) as a feature, we use the probability density function, which retains the full distributional information and hence enables high-performance classification. Accordingly, we propose using the histogram of the RRIs from an ECG as a natural and general feature, with the widely used support vector machine (SVM) as the classifier.
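To make this argument concrete, the following minimal Python sketch uses made-up RRI values to build two segments that share the same mean and variance, and therefore the same coefficient of variation (computed here in population form, which may differ in detail from the definition in [3]), yet have clearly different histograms. The segment values, bin edges, and helper names are hypothetical and chosen only for illustration.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


def histogram(values, edges):
    """Count values falling in the half-open bins [edges[k], edges[k + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(edges) - 1):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    return counts


# Two hypothetical 4-beat RRI segments (ms), both with mean 800 ms
# and population standard deviation 200 ms.
seg_a = [600.0, 1000.0, 600.0, 1000.0]
spread = 200.0 * math.sqrt(2.0)
seg_b = [800.0 - spread, 800.0, 800.0, 800.0 + spread]

# Hypothetical bin edges (ms) used only for this illustration.
edges = [400.0, 550.0, 700.0, 900.0, 1050.0, 1200.0]

print(coefficient_of_variation(seg_a), coefficient_of_variation(seg_b))  # both ≈ 0.25
print(histogram(seg_a, edges))  # [0, 2, 0, 2, 0]
print(histogram(seg_b, edges))  # [1, 0, 2, 0, 1]
```

A classifier fed only the mean, variance, or coefficient of variation cannot separate these two segments, whereas their histograms differ in every bin.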

Databases
This study employed the MIT-BIH atrial fibrillation database (AFDB) [4,5], which is widely used in arrhythmia studies. This database includes 25 records of human subjects with AF; each record contains two-channel ECG signals sampled at 250 Hz with 12-bit A/D resolution. The database also provides clinical rhythm annotations and QRS annotations, and supports online retrieval with the easy-to-use WaveForm DataBase (WFDB) toolbox [43,44]. Because the R-wave locations are already available as QRS annotations retrievable with WFDB, this paper does not address the detection of R waves from ECG signals; interested readers are referred to the extensive literature on this topic [45–48]. The MIT-BIH long-term atrial fibrillation database (LTAFDB) [49,50] was also employed as a positive test dataset; it includes 84 long-term (24-hour) ECG recordings sampled at 128 Hz with the same 12-bit resolution as the AFDB.
To evaluate specificity on normal controls, this study also employed the MIT-BIH normal sinus rhythm database (NSRDB) [51,52], which includes long-term ECG recordings of 18 human subjects who exhibited no significant signs of arrhythmia.

Performance criteria
Three widely used criteria were employed to quantify AF detection performance: accuracy (ACC), sensitivity (SEN), and specificity (SPE).
SEN, also called the true positive rate, measures how well a method identifies real patients and is defined as the proportion of true positives among all positive subjects.
SPE, also called the true negative rate, measures how well a method identifies normal subjects and is defined as the proportion of true negatives among all negative subjects.
For diagnosis and screening, there is a trade-off between SEN and SPE; therefore, ACC is commonly used to account for SEN and SPE jointly. ACC is defined as the proportion of true positives plus true negatives among all samples.
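The three definitions above translate directly into code. The sketch below is illustrative only: the function name and the confusion-matrix counts are hypothetical.

```python
def detection_metrics(tp, fn, tn, fp):
    """Return (ACC, SEN, SPE) from confusion-matrix counts.

    tp: AF segments detected as AF      fn: AF segments missed
    tn: normal segments kept as normal  fp: normal segments flagged as AF
    """
    sen = tp / (tp + fn)                    # true positive rate
    spe = tn / (tn + fp)                    # true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)   # overall proportion correct
    return acc, sen, spe


# Hypothetical counts: 100 positive and 200 negative segments.
acc, sen, spe = detection_metrics(tp=90, fn=10, tn=195, fp=5)
print(acc, sen, spe)  # 0.95 0.9 0.975
```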
Precision (PRE), also known as positive predictive value (PPV), is reported in several studies and is defined as the proportion of true positives among all detected positive cases. However, among these four criteria (ACC, SEN, SPE, and PRE), only three are independent, and the fourth can be calculated from the other three (see S1 File). Therefore, only ACC, SEN, and SPE were evaluated in this study, and PRE was not considered. Among the studies listed in Table 1, some reported PRE while others reported ACC; we used the formulas in S1 File to convert between them, and the converted values are marked with asterisks.
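One way to see why one criterion is redundant is through the prevalence π = P/(P + N), where P and N are the numbers of positive and negative samples. Since ACC = π·SEN + (1 − π)·SPE, the prevalence follows as π = (ACC − SPE)/(SEN − SPE), and then PRE = π·SEN / (π·SEN + (1 − π)(1 − SPE)); solving the PRE relation for π gives the reverse conversion. The sketch below is our own derivation and may be written differently from the formulas in S1 File; note that it breaks down when SEN = SPE (or, in reverse, when SPE = 1), because the prevalence is then not identifiable. Function names and counts are hypothetical.

```python
def precision_from(acc, sen, spe):
    """PRE from ACC, SEN, SPE via the prevalence implied by ACC."""
    if sen == spe:
        raise ValueError("prevalence is not identifiable when SEN == SPE")
    prev = (acc - spe) / (sen - spe)
    return prev * sen / (prev * sen + (1.0 - prev) * (1.0 - spe))


def accuracy_from(pre, sen, spe):
    """ACC from PRE, SEN, SPE via the prevalence implied by PRE."""
    denom = sen * (1.0 - pre) + pre * (1.0 - spe)
    if denom == 0.0:
        raise ValueError("prevalence is not identifiable from these values")
    prev = pre * (1.0 - spe) / denom
    return prev * sen + (1.0 - prev) * spe


# Round trip on hypothetical counts: TP=90, FN=10, TN=195, FP=5.
acc, sen, spe, pre = 285 / 300, 90 / 100, 195 / 200, 90 / 95
print(precision_from(acc, sen, spe))  # ≈ 0.947368 (= 90/95)
print(accuracy_from(pre, sen, spe))   # ≈ 0.95
```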

Data pre-processing.
After the 127 records were downloaded, the following pre-processing steps were applied:
1. RRI values were converted from numbers of samples to milliseconds by dividing by the sampling frequency of the record and multiplying by 1000;
2. The annotations and comments of AFDB and LTAFDB were parsed, and RRI regions labelled with the rhythm string '(AFIB' were selected as positive regions;
3. All RRI regions of NSRDB were selected as negative regions;
4. Both positive and negative regions were cut into segments, each containing 30 RRIs;
5. A histogram with M bins was calculated for each RRI segment and stored in a row vector of length M;
6. The $N_0$ row vectors from NSRDB were stacked vertically to form the negative sample matrix $X_0$; the same procedure applied to the vectors from AFDB and LTAFDB yielded the matrices $X_1$ and $X_2$ with $N_1$ and $N_2$ rows, respectively.
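Steps 1, 2, 4, and 6 can be sketched as follows. This is a minimal illustration under several assumptions not stated in the text: rhythm annotations are given as (sample index, aux note) pairs where each label holds until the next rhythm annotation, segments are consecutive and non-overlapping, trailing RRIs that do not fill a whole segment are discarded, and the per-segment feature (step 5) is passed in as a function. All names and the toy record are hypothetical; the placeholder feature stands in for the RRI histogram.

```python
def afib_regions(rhythm_annotations, record_end):
    """Return (start, end) sample ranges whose rhythm label starts with '(AFIB'.

    Assumes annotations are sorted by sample index and each rhythm label
    holds until the next rhythm annotation (or the end of the record).
    """
    regions = []
    for k, (start, note) in enumerate(rhythm_annotations):
        if k + 1 < len(rhythm_annotations):
            end = rhythm_annotations[k + 1][0]
        else:
            end = record_end
        if note.startswith("(AFIB"):
            regions.append((start, end))
    return regions


def rr_intervals_ms(r_peaks, fs_hz):
    """Step 1: consecutive R-peak sample indices -> RRIs in milliseconds."""
    return [(b - a) * 1000.0 / fs_hz for a, b in zip(r_peaks, r_peaks[1:])]


def split_segments(rri_ms, seg_len=30):
    """Step 4: cut an RRI sequence into non-overlapping segments of seg_len RRIs."""
    n_full = len(rri_ms) // seg_len  # leftover RRIs are dropped (assumption)
    return [rri_ms[k * seg_len:(k + 1) * seg_len] for k in range(n_full)]


def build_sample_matrix(r_peaks, regions, fs_hz, feature, seg_len=30):
    """Step 6: one row per segment, stacked vertically (list of rows)."""
    rows = []
    for start, end in regions:
        peaks = [p for p in r_peaks if start <= p < end]
        for seg in split_segments(rr_intervals_ms(peaks, fs_hz), seg_len):
            rows.append(list(feature(seg)))
    return rows


# Hypothetical record: one beat every 200 samples at 250 Hz (800 ms RRIs).
fs = 250
r_peaks = list(range(0, 20000, 200))
annotations = [(0, "(N"), (2000, "(AFIB"), (16000, "(N")]
regions = afib_regions(annotations, record_end=20000)
# Placeholder feature (min, max); the study uses an M-bin histogram instead.
X1 = build_sample_matrix(r_peaks, regions, fs, feature=lambda seg: [min(seg), max(seg)])
print(regions, X1)  # [(2000, 16000)] [[800.0, 800.0], [800.0, 800.0]]
```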

Classifier.
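A soft-margin support vector machine (SVM) [53,54] was trained as the classifier; its training is formally defined as the following optimization problem (Lagrangian dual form):

$$\max_{\alpha}\ \operatorname{sum}(\alpha) - \frac{1}{2}\,(\alpha \odot y)^{\mathsf T} K_X\,(\alpha \odot y) \quad \text{s.t.} \quad y^{\mathsf T}\alpha = 0,\ \ 0 \le \alpha_i \le c,\ \ i = 1, \dots, N, \tag{1}$$

where $X = [x_1; x_2; \dots; x_N] \in \mathbb{R}^{N \times M}$ stores the N training samples, each sample $x_i \in \mathbb{R}^{1 \times M}$ being a row vector of length M; $y \in \mathbb{R}^{N \times 1}$ stores the sample labels (1 for AF and −1 for normal); $\alpha \in \mathbb{R}^{N \times 1}$ is the weight vector to be optimized; sum(α) is the sum of all elements of α; ⊙ denotes element-wise multiplication (the Hadamard product); c is the box constraint parameter, which controls the strength of regularization; and $K_X = [k_{ij}] \in \mathbb{R}^{N \times N}$ is the kernel matrix of X, whose element $k_{ij} = \kappa(x_i, x_j)$ is given by the Gaussian kernel (radial basis function)

$$\kappa(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{\sigma^2}\right), \tag{2}$$

where σ is a scale parameter.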
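The quantities in the dual problem can be evaluated directly for small examples. The sketch below builds a Gaussian kernel matrix, evaluates the dual objective sum(α) − ½(α⊙y)ᵀK(α⊙y), and checks the constraints yᵀα = 0 and 0 ≤ αᵢ ≤ c. It assumes the kernel form exp(−‖xᵢ − xⱼ‖²/σ²); some texts use 2σ² in the denominator instead. It does not solve the problem, and all names and the toy data are hypothetical.

```python
import math


def gaussian_kernel(xi, xj, sigma):
    """exp(-||xi - xj||^2 / sigma^2) for two row vectors given as lists."""
    sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq / sigma ** 2)


def kernel_matrix(X, sigma):
    """N x N kernel matrix K_X with k_ij = kappa(x_i, x_j)."""
    return [[gaussian_kernel(xi, xj, sigma) for xj in X] for xi in X]


def dual_objective(alpha, y, K):
    """sum(alpha) - 0.5 * (alpha ⊙ y)^T K (alpha ⊙ y)."""
    ay = [a * t for a, t in zip(alpha, y)]  # element-wise (Hadamard) product
    n = len(ay)
    quad = sum(ay[i] * K[i][j] * ay[j] for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def is_feasible(alpha, y, c, tol=1e-9):
    """Check y^T alpha = 0 and the box constraint 0 <= alpha_i <= c."""
    balanced = abs(sum(a * t for a, t in zip(alpha, y))) <= tol
    boxed = all(-tol <= a <= c + tol for a in alpha)
    return balanced and boxed


# Toy data: two one-dimensional samples with opposite labels.
X = [[0.0], [2.0]]
y = [-1, 1]
K = kernel_matrix(X, sigma=1.0)
print(dual_objective([0.5, 0.5], y, K), is_feasible([0.5, 0.5], y, c=1.0))
```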
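To solve problem (1), sequential minimal optimization (SMO) [55] was used. Once α is obtained, the bias parameter b can be calculated from any free support vector $x_i$ (i.e., one with $0 < \alpha_i < c$) as

$$b = y_i - \sum_{j=1}^{N} \alpha_j y_j k_{ji}. \tag{3}$$

For a test histogram vector t, the prediction function reads

$$p(t) = \sum_{j=1}^{N} \alpha_j y_j\, \kappa(x_j, t) + b, \tag{4}$$

and if p(t) > 0, an AF event is detected.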
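Given a weight vector α from any solver (SMO or otherwise), the bias and the decision function can be computed as below. This sketch uses the standard relation b = yᵢ − Σⱼ αⱼ yⱼ k(xⱼ, xᵢ) for a free support vector and averages it over all free support vectors, which is a common choice but an assumption here; the prediction is p(t) = Σⱼ αⱼ yⱼ κ(xⱼ, t) + b. A linear kernel and a hand-solved two-point problem keep the example checkable; all names are hypothetical.

```python
def linear_kernel(u, v):
    """Plain dot product of two row vectors."""
    return sum(a * b for a, b in zip(u, v))


def svm_bias(alpha, y, X, c, kernel, tol=1e-8):
    """Average of y_i - sum_j alpha_j y_j k(x_j, x_i) over free support vectors."""
    free = [i for i, a in enumerate(alpha) if tol < a < c - tol]
    if not free:
        raise ValueError("no free support vectors; b is not uniquely determined")
    total = 0.0
    for i in free:
        total += y[i] - sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(len(X)))
    return total / len(free)


def svm_predict(t, alpha, y, X, b, kernel):
    """Decision value p(t); p(t) > 0 is read as an AF event."""
    return sum(alpha[j] * y[j] * kernel(X[j], t) for j in range(len(X))) + b


# Two-point problem solved by hand: the optimum is alpha = [0.5, 0.5],
# giving the separating boundary at x = 1.
X = [[0.0], [2.0]]
y = [-1, 1]
alpha = [0.5, 0.5]
b = svm_bias(alpha, y, X, c=1.0, kernel=linear_kernel)
print(b)                                                   # -1.0
print(svm_predict([3.0], alpha, y, X, b, linear_kernel))   # 2.0  -> AF
print(svm_predict([0.5], alpha, y, X, b, linear_kernel))   # -0.5 -> normal
```

Averaging over all free support vectors, rather than using a single one, reduces the effect of the solver's finite tolerance on b.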

Kernel function
First, we compared the performance of the Gaussian kernel (2) with that of the linear kernel $k_l(x_i, x_j) = x_i x_j^{\mathsf T}/\sigma^2$ and of a third-order polynomial kernel. At this step, all other parameters were set to their default values (scale parameter σ = 1, box constraint parameter c = 1, and number of bins M = 10). The SVM classifier was trained on the positive and negative sample matrices $X_1$ and $X_0$ by constructing a kernel matrix $K_X$ of size $(N_1 + N_0) \times (N_1 + N_0)$ and a label vector y with $N_1$ ones and $N_0$ minus ones. Optimization problem (1) was solved with the SMO solver to obtain the weight vector α, and the bias parameter b was then calculated with Eq (3). Subsequently, the same samples were classified with the trained SVM, and the performance criteria were evaluated. Fig 3 illustrates the performance with the different kernel functions: the Gaussian (radial basis function) kernel performed best and was therefore used in the remaining experiments.
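The three kernels compared in this experiment, together with the label vector, can be written as below. The linear and Gaussian forms follow the definitions given in the text, with the scale σ dividing the inner product or squared distance. The cubic kernel is assumed here to take the inhomogeneous form (1 + xᵢxⱼᵀ/σ²)³; the exact polynomial form used in the study may differ. Function names are hypothetical.

```python
import math


def linear_kernel(u, v, sigma=1.0):
    """x_i x_j^T / sigma^2."""
    return sum(a * b for a, b in zip(u, v)) / sigma ** 2


def cubic_kernel(u, v, sigma=1.0):
    """Assumed inhomogeneous third-order polynomial: (1 + x_i x_j^T / sigma^2)^3."""
    return (1.0 + linear_kernel(u, v, sigma)) ** 3


def gaussian_kernel(u, v, sigma=1.0):
    """exp(-||x_i - x_j||^2 / sigma^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / sigma ** 2)


def training_labels(n_pos, n_neg):
    """Label vector with n_pos ones (AF rows of X1) then n_neg minus ones (rows of X0)."""
    return [1] * n_pos + [-1] * n_neg


KERNELS = {"linear": linear_kernel, "cubic": cubic_kernel, "gaussian": gaussian_kernel}

for name, kern in KERNELS.items():
    print(name, kern([1.0, 2.0], [3.0, 4.0]))
```

Swapping the kernel function is the only change between the three runs; the solver, labels, and evaluation stay the same.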

Number of histogram bins
Next, we tested the impact of the number of histogram bins M on detection performance. Because approximately 99% of the RRI values lie between 50 ms and 2000 ms, the centres of the first and last bins were set to 50 ms and 2000 ms, respectively, and the other M − 2 bin centres were spaced linearly within this range. RRI values outside this range were assigned to the first or the last bin.
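The binning rule above can be sketched as follows, assuming each RRI is assigned to its nearest bin centre (equivalent to placing bin edges midway between centres). Under that assumption the bin index is simply the rounded position along the linear grid, clipped to the first or last bin. Function names are hypothetical; exact ties between two centres follow Python's round-half-to-even rule.

```python
def bin_centres(m, lo=50.0, hi=2000.0):
    """M linearly spaced centres, the first at lo and the last at hi (ms)."""
    step = (hi - lo) / (m - 1)
    return [lo + k * step for k in range(m)]


def rri_histogram(segment, m=30, lo=50.0, hi=2000.0):
    """Count RRIs (ms) per bin using nearest-centre assignment."""
    step = (hi - lo) / (m - 1)
    counts = [0] * m
    for v in segment:
        k = round((v - lo) / step)   # index of the nearest centre
        k = min(max(k, 0), m - 1)    # out-of-range values go to the first/last bin
        counts[k] += 1
    return counts


centres = bin_centres(30)
hist = rri_histogram([10.0, 50.0, 800.0, 2500.0])
print(centres[0], centres[-1])
print(hist[0], hist[11], hist[29])  # 2 1 1
```

Because every segment contains the same number of RRIs, the raw counts are proportional to an estimate of the probability density, so no further normalisation is needed for the rows to be comparable.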
The SVM training and performance evaluation were the same as in the previous experiment. Fig 4 shows that detection performance increases with M and plateaus at M = 30. Therefore, M was fixed at 30 in the following experiments.

Cross-validation with scale and box constraint parameters
The scale parameter σ and the box constraint parameter c strongly affect training; hence, we used ten-fold cross-validation to optimize these two parameters. Both σ and c were sampled on a two-dimensional logarithmic grid. The SVM training, performance evaluation, and training dataset were the same as in the previous experiment. Fig 5 shows the results: panel (b) indicates that high SEN requires large values of σ and c, whereas panel (c) indicates that high SPE requires small values of σ and c. Because ACC accounts for both SEN and SPE, it was used to select the parameters; the best performance was achieved at σ = 3.2 and c = 1 (the yellow star in panel (a)). Table 2 lists the ten-fold cross-validation performance for this setting.
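A grid search of this kind can be sketched as below: build logarithmic grids for σ and c, split the sample indices into ten folds, and keep the pair with the highest mean fold ACC. The grid ranges, the random (non-stratified) fold assignment, and the `fit_and_score` callback, which would train the SVM on the training indices and return ACC on the held-out fold, are all hypothetical; a toy scorer stands in for the SVM so that the example runs on its own.

```python
import math
import random
import statistics


def log_grid(lo_exp, hi_exp, per_decade):
    """Values 10**e for e from lo_exp to hi_exp in steps of 1/per_decade."""
    n = (hi_exp - lo_exp) * per_decade
    return [10 ** (lo_exp + k / per_decade) for k in range(n + 1)]


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def grid_search(fit_and_score, n, sigmas, cs, k=10, seed=0):
    """Return (sigma, c, mean ACC) with the best k-fold mean ACC."""
    folds = kfold_indices(n, k, seed)
    best = None
    for sigma in sigmas:
        for c in cs:
            scores = []
            for f in range(k):
                test_idx = folds[f]
                train_idx = [i for g in range(k) if g != f for i in folds[g]]
                scores.append(fit_and_score(train_idx, test_idx, sigma, c))
            mean_acc = statistics.fmean(scores)
            if best is None or mean_acc > best[2]:
                best = (sigma, c, mean_acc)
    return best


def toy_score(train_idx, test_idx, sigma, c):
    """Stand-in for SVM training/testing: peaks at sigma = 10**0.5, c = 1."""
    return 1.0 - (math.log10(sigma) - 0.5) ** 2 - math.log10(c) ** 2


grid = log_grid(-1, 1, 2)  # 0.1, 0.316..., 1, 3.16..., 10
best = grid_search(toy_score, n=100, sigmas=grid, cs=grid)
print(best)
```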

Independent dataset testing
In the last experiment, the SVM model was trained with AFDB ($N_1$ = 16817) and NSRDB ($N_0$ = 58742) as positive and negative samples, respectively, with the model parameters set according to the results of the previous experiments. LTAFDB ($N_2$ = 101376) was then used as the independent positive test dataset. The confusion matrix is shown in Table 3; the resulting ACC, SEN, and SPE were 0.9697, 0.9524, and 0.9994, respectively, indicating good generalization performance.

Conclusion and discussion
This paper proposed an accurate method for detecting atrial fibrillation events based on the RR intervals measured from an ECG signal. Its advantage over the methods described in the literature is that, instead of using any specific numerical characteristic (e.g., entropy, mean, median, root mean square, variance, or quantiles, or a combination of several such characteristics) as the input feature, it uses the probability density of the RRIs, which retains the full distribution of RRI values in a segment; the density is therefore a natural, comprehensive, easy-to-compute, and efficient input feature. On the MIT-BIH databases, the proposed method achieved an ACC, SEN, and SPE of 0.9843±0.0019, 0.9848±0.0029, and 0.9840±0.0024, respectively, in ten-fold cross-validation, and of 0.9697, 0.9524, and 0.9994, respectively, in independent testing, indicating that the proposed method is effective for AF detection.

Note that some studies have highlighted the difference between the RRI histograms of AF and normal rhythm and have proposed using histograms to detect AF, but they use histograms quite differently from this study. For example, Tateno and Glass [3] observed increased variation in AF episodes and proposed using the Kolmogorov-Smirnov test to compare the histograms of AF RRIs and normal RRIs; Petrucci et al. [26] calculated several statistics, such as the distribution width, from the histograms of RRI prematurity and ΔRRI, and used a geometric test to detect AF. In contrast, this study uses the histogram itself as the feature vector input to a support vector machine for classification, which is the main contribution of this study.